YAC - A Recursive Chunker for Unrestricted German Text

Authors

  • Hannah Kermes
  • Stefan Evert
Abstract

YAC is a fully automatic recursive chunker for unrestricted German text. It is designed specifically to provide a useful basis for the extraction of linguistic as well as lexicographic information. Consequently, the grammar rules of YAC are written so that the resulting analysis meets the needs of the subsequent extraction process. The chunks provided by YAC are continuous parts of intra-clausal constituents; they may be recursive but include no PP-attachment and no sentential elements. The chunks are additionally enriched with information about head lemma, morpho-syntactic features and certain lexical and structural properties.
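
To make the description of the output more concrete, the following is a minimal Python sketch of how a recursive chunk annotated with a head lemma and morpho-syntactic features could be represented. It is not YAC's implementation or its output format: the class names `Token` and `Chunk`, the field names, the feature keys, the tag labels and the example phrase are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Token:
    word: str
    lemma: str
    pos: str  # part-of-speech tag; the labels used below are illustrative


@dataclass
class Chunk:
    category: str                          # hypothetical labels such as "NP" or "AP"
    children: List[Union["Chunk", Token]]  # a chunk may contain other chunks (recursion)
    head_lemma: str
    features: Dict[str, str] = field(default_factory=dict)

    def tokens(self) -> List[Token]:
        """Return the continuous token span covered by this chunk, left to right."""
        out: List[Token] = []
        for child in self.children:
            if isinstance(child, Chunk):
                out.extend(child.tokens())
            else:
                out.append(child)
        return out

    def depth(self) -> int:
        """Return the nesting depth: 1 for a flat chunk, more for embedded chunks."""
        nested = [c.depth() for c in self.children if isinstance(c, Chunk)]
        return 1 + max(nested, default=0)


# Hypothetical example: "der sehr kleine Hund" ("the very small dog"),
# a noun chunk that embeds an adjective chunk.
adjective_chunk = Chunk(
    category="AP",
    children=[Token("sehr", "sehr", "ADV"), Token("kleine", "klein", "ADJA")],
    head_lemma="klein",
)
noun_chunk = Chunk(
    category="NP",
    children=[Token("der", "der", "ART"), adjective_chunk, Token("Hund", "Hund", "NN")],
    head_lemma="Hund",
    features={"case": "nom", "number": "sg", "gender": "masc"},
)
```

In this sketch an extraction step could read `noun_chunk.head_lemma` and `noun_chunk.features` directly, without walking the tree, which is the kind of use the abstract's "enriched" chunks suggest.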

Similar resources

Robust German Noun Chunking With a Probabilistic Context-Free Grammar

We present a noun chunker for German which is based on a head-lexicalised probabilistic context-free grammar. A manually developed grammar was semi-automatically extended with robustness rules in order to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by a probabilistic context-free parser. For extracting noun chunks, the parser generates all ...

Annotation, storage, and retrieval of mildly recursive structures

This paper describes an unusual approach to the partial syntactic analysis of unrestricted German text. Unlike most other chunk parsers, which are specially designed and implemented for the single purpose of annotating syntactic structures, YAC is the result of a slow evolution from on-line to off-line analysis. As we formulated increasingly complex queries in the CQP query language, some of wh...

Alignment-Guided Chunking

We introduce an adaptable monolingual chunking approach, Alignment-Guided Chunking (AGC), which makes use of knowledge of word alignments acquired from bilingual corpora. Our approach is motivated by the observation that a sentence should be chunked differently depending on the foreseen end-tasks. For example, given the different requirements of translation into (say) French and German, it is inappro...

An Affinity Based Greedy Approach towards Chunking for Indian Languages

A robust chunker can drastically reduce the complexity of parsing natural language text. Chunking for Indian languages requires a novel approach because of the relatively unrestricted order of words within a word group. A computational framework for chunking based on valency theory and feature structures is described here. The paper also draws an analogy of chunk formation in free word ...

Analysis of German Patent Literature

We show how several components of the JET natural language analysis tool, originally developed at New York University for the analysis of English text, were adapted to German. These components, such as the part-of-speech tagger and the noun chunker, are explained in terms that should be understandable to a layman. On the other hand, issues that arise specifically with regard to the German langu...

Publication date: 2002